Performance Evaluation
A Survey on Inference Engines for Large Language Models: Perspectives on Optimization and Efficiency
Park, Sihyeong, Jeon, Sungryeol, Lee, Chaelyn, Jeon, Seokhun, Kim, Byung-Soo, Lee, Jemin
Large language models (LLMs) are widely applied in chatbots, code generators, and search engines. Workloads such as chain-of-thought, complex reasoning, and agent services significantly increase inference cost by invoking the model repeatedly. Optimization methods such as parallelism, compression, and caching have been adopted to reduce costs, but diverse service requirements make it hard to select the right method. Recently, specialized LLM inference engines have emerged as a key component for integrating optimization methods into service-oriented infrastructures. However, a systematic study of inference engines is still lacking. This paper provides a comprehensive evaluation of 25 open-source and commercial inference engines. We examine each inference engine in terms of ease of use, ease of deployment, general-purpose support, scalability, and suitability for throughput- and latency-aware computation. Furthermore, we explore the design goals of each inference engine by investigating the optimization techniques it supports. In addition, we assess the ecosystem maturity of open-source inference engines and examine the performance and cost policies of commercial solutions. We outline future research directions, including support for complex LLM-based services, support for various hardware, and enhanced security, offering practical guidance to researchers and developers in selecting and designing optimized LLM inference engines. We also provide a public repository to continually track developments in this fast-evolving field: \href{https://github.com/sihyeong/Awesome-LLM-Inference-Engine}{https://github.com/sihyeong/Awesome-LLM-Inference-Engine}.
Performance Evaluation of Bitstring Representations in a Linear Genetic Programming Framework
Meli, Clyde, Nezval, Vitezslav, Oplatkova, Zuzana Kominkova, Buttigieg, Victor, Staines, Anthony Spiteri
Different bitstring representations can yield varying computational performance. This work compares three bitstring implementations in C++: std::bitset, boost::dynamic_bitset, and a custom direct implementation. Their performance is benchmarked in the context of concatenation within a Linear Genetic Programming system. Benchmarks were conducted on three platforms (macOS, Linux, and Windows MSYS2) to assess platform-specific performance variations. The results show that the custom direct implementation delivers the fastest performance on Linux and Windows, while std::bitset performs best on macOS. Although consistently slower, boost::dynamic_bitset remains a viable and flexible option. These findings highlight the influence of compiler optimisations and system architecture on performance, providing practical guidance for selecting the optimal method based on platform and application requirements.
Performance Evaluation of Ising and QUBO Variable Encodings in Boltzmann Machine Learning
Hasegawa, Yasushi, Ohzeki, Masayuki
We compare Ising ({-1,+1}) and QUBO ({0,1}) encodings for Boltzmann machine learning under a controlled protocol that fixes the model, sampler, and step size. Exploiting the identity that the Fisher information matrix (FIM) equals the covariance of sufficient statistics, we visualize empirical moments from model samples and reveal systematic, representation-dependent differences. QUBO induces larger cross terms between first- and second-order statistics, creating more small-eigenvalue directions in the FIM and lowering spectral entropy. This ill-conditioning explains slower convergence under stochastic gradient descent (SGD). In contrast, natural gradient descent (NGD), which rescales updates by the FIM metric, achieves similar convergence across encodings due to reparameterization invariance. Practically, for SGD-based training, the Ising encoding provides more isotropic curvature and faster convergence; for QUBO, centering/scaling or NGD-style preconditioning mitigates curvature pathologies. These results clarify how representation shapes information geometry and finite-time learning dynamics in Boltzmann machines and yield actionable guidelines for variable encoding and preprocessing.